Indexing Variable Length Substrings for Exact and Approximate Matching

Authors

  • Gonzalo Navarro
  • Leena Salmela
Abstract

We introduce two new index structures based on the q-gram index. The new structures index substrings of variable length instead of q-grams of fixed length. For both of the new indexes, we present a method based on the suffix tree to efficiently choose the indexed substrings so that each of them occurs almost equally frequently in the text. Our experiments show that the resulting indexes are up to 40% faster than the q-gram index when they use the same space.
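For orientation, the baseline that the abstract compares against, a fixed-length q-gram index, can be sketched as follows. This is a minimal illustration only and does not implement the variable-length structures or the suffix-tree-based selection described by the authors. The function names `build_qgram_index` and `search_exact` are hypothetical, chosen for this example.

```python
from collections import defaultdict


def build_qgram_index(text, q):
    """Map every q-gram of `text` to the ascending list of its start positions."""
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index


def search_exact(text, index, q, pattern):
    """Return all start positions of `pattern` in `text`.

    The first q-gram of the pattern is used as a filter: only positions
    listed for that q-gram are verified against the text.
    """
    if len(pattern) < q:
        raise ValueError("pattern must be at least q characters long")
    candidates = index.get(pattern[:q], [])
    return [i for i in candidates if text.startswith(pattern, i)]


text = "abracadabra"
q = 3
index = build_qgram_index(text, q)
print(search_exact(text, index, q, "abra"))
```

In a fixed-length index like this one, frequent q-grams have long position lists and rare ones have short lists. The abstract's idea of choosing indexed substrings so that each occurs almost equally often addresses that imbalance.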


Related resources

Abelian pattern matching in strings

Abelian pattern matching is a new class of pattern matching problems. In abelian patterns, the order of the characters in the substrings does not matter, e.g. the strings abbc and babc represent the same abelian pattern a+2b+c. Therefore, unlike classical pattern matching, we do not look for an exact (ordered) occurrence of a substring, rather the aim here is to find any permutation of a given ...


Efficient Approximate and Dynamic Matching of Patterns Using a Labeling Paradigm

A key approach in string processing algorithmics has been the labeling paradigm [KMR72], which is based on assigning labels to some of the substrings of a given string. If these labels are chosen consistently, they can enable fast comparisons of substrings. Until the first optimal parallel algorithm for suffix tree construction was given in [SV94], the labeling paradigm was considered not to be compe...


Approximate String Matching with Variable Length Don't Care

Searching for DNA or amino acid sequences similar to a given pattern string is very important in molecular biology. In fact, a lot of programs and algorithms have been developed. Most of them are based on alignment of strings or approximate string matching. However, they do not seem to be adequate in some cases. For example, the DNA pattern TATA (known as TATA box) is a common promoter that oft...


Filtration Algorithms for Approximate Order-Preserving Matching

The exact order-preserving matching problem is to find all the substrings of a text T which have the same length and relative order as a pattern P. Like string matching, order-preserving matching can be generalized by allowing the match to be approximate. In approximate order-preserving matching, two strings match if they have the same relative order after removing up to k elements in the same p...


Efficient Algorithm for δ-Approximate Jumbled Pattern Matching

The Jumbled Pattern Matching problem consists of finding substrings which can be permuted to be equal to a given pattern. Similarly, the δ-Approximate Jumbled Pattern Matching problem asks for substrings equivalent to a permutation of the given pattern, but allowing a vector of possible errors δ. Here we provide a new efficient solution for the δ-Approximate Jumbled Pattern Matching problem usin...




Publication date: 2009